
Poster: Jonathan Aizen Date: May 8, 2003 11:51am
Forum: web Subject: Re: Removal of Site- not from google

You should note that the Internet Archive is in no way affiliated with Google. I'm curious where you got that impression.

But in answer to your question, you are absolutely not being punished for having your site removed. As the site author that decision is up to you. It's unfortunate, in my opinion, that you've decided to exclude your site from this permanent Archive, one which our grandchildren's children will be able to use and learn from, but doing so is your own decision, and your site is not being penalized for this decision.

This post was modified by Jonathan Aizen on 2003-05-08 18:51:44


Poster: Concerned with Copyright Date: May 8, 2003 1:35pm
Forum: web Subject: Re: Removal of Site- not from google



Thank you for your response. In response to your question about the Internet Archive and Google (are you saying that the Internet Archive is not affiliated with Alexa either?), I would first like to direct your attention to the Alexa site:

http://pages.alexa.com/company/partners.html

You will see that they have a "partnership" with the Internet Archive - as well as with Google. There are several other examples of this partnership on the Alexa site.


Regarding a partnership with Google, as Alexa is partnered with both Archive.org and Google, it is logical to suggest that there is, even if in a roundabout manner, a link between Archive.org and Google. In support of this, please look at the "Council on Library and Information Resources" site at:

http://www.clir.org/pubs/reports/pub106/web.html

"Alternatively, consider the model of Alexa Internet and the Internet Archive. Alexa Internet is a for-profit corporation that measures the quality of Web pages by tracing consumers' use of the Web. These measurements are made using an enormous Web archive, built by Alexa Internet using Web "spiders" (robots or agents) that roam the Web copying everything they find, unless forbidden entry. In this model, commercial use provides a viable economic base for the creation of the Web archive; note that Yahoo!, Google, and other search engine companies have also built large Web archives for commercial purposes. Alexa Internet then turns over the Web archive to the nonprofit Internet Archive, which provides for long-term preservation of the digital archive."

When you enter the site, www.edethics.org, into an Alexa search, you will see that the information listed there is very incomplete - almost as if the site didn't exist for most of the time it has been on the 'net. Interestingly, Alexa, unlike Google, has done a rather large search of the site - in fact, they downloaded nearly 20 MB of material about a month ago! The fact that the limited information is so incomplete also indicates some kind of problem with Alexa.






While I appreciate your response, and it is rather altruistic, it is far better for people to publish their family history instead (I have). Also, in the case of this site, we document very real evils in our public schools and the evil people (I mean that in the most literal sense of the term) who harm children and their teachers. While we do not post things that are not true, we do document the personal horrors inflicted upon wonderful teachers and their students - the site is filled with real stories of nervous breakdowns, numerous cases of post-traumatic stress disorder, bankruptcies, and other horrible things caused by sick school administrators and corrupt teachers' unions. Sometimes, the teachers who have posted their stories there ask for them to be edited or removed - we honor all such requests - even though these stories are completely true. These teachers certainly don't want to revisit their horrors on archive.org when they have requested us to delete their story.

Another very important issue is the fact that the evil school districts and corrupt unions that hate us (we're glad) because we tell the truth have actually used the site (in fact, printed out every page on it) as "evidence" to continue their vile attacks on innocent people. The last time that happened, a sleazy school district and their shysters (again, that's a literal term) tried to retaliate against the teacher by trying to take away his credential to teach - the teacher had previously (and successfully) brought that evil place down when he prevailed in a US Dept. of Education, Office for Civil Rights case against the district for their intentional racist practices and harm to children with severe needs. Those phony charges were dropped, and OCR determined that the district had retaliated against him. The last thing we want is for deep-pocketed districts and their shysters to copy complete sets of our site, and archives of that site, to further harm innocent people.

As there is a connection between Archive.org and Alexa (who is also partnered with Google), I remain completely perplexed as to the sudden (and significant) drop of the site and plunge in the number of visitors.

Thanks.


Poster: Jonathan Aizen Date: May 8, 2003 1:45pm
Forum: web Subject: Re: Removal of Site- not from google

The partnership with Alexa is exactly as you quoted: Alexa donates crawls of the Web for the Wayback Machine. As far as I know, IA has no ties to Google, despite Google's amazing service.

I appreciate your concerns about not wanting people to be able to easily rip your website's contents from an Archive such as this. I won't try to argue against it, and again that's why you are given the choice to have your contents excluded.

There is no conspiracy here - your voluntary exclusion from the Internet Archive would have no direct effect on your rankings in Google or Alexa. It might, as Diana suggested, be a side effect, but to me even that seems unlikely.

Do you use a robots.txt file to prevent crawlers like Alexa's from crawling your site?

This post was modified by Jonathan Aizen on 2003-05-08 20:45:37


Poster: Concerned with Copyright Date: May 9, 2003 4:15pm
Forum: web Subject: Re: Removal of Site- not from google

Hi Jonathan,

Thanks for your response. I'm glad that nothing at the Wayback Machine is intentionally blocking our site. The problem, however, remains and I'm at a complete loss as to where to turn next.

A robots.txt file (the one suggested on the Wayback Machine site) was used for a short period of time - right after I discovered that the site was being archived. This was done because I couldn't get the site removed for several weeks. (I believe that the Wayback Machine site says that the use of the robots.txt file would only impact archiving of the site and not affect other crawls.) It was after that, however, that the hits to the site started to plunge dramatically (we've reached as low as 25% of our usual daily total lately). The robots.txt file was removed as soon as the site was removed from the archives - does Alexa recognize the same robots.txt? If that's the case, it might explain all of the strange things that have occurred over these past few months.

Thanks


Poster: Jonathan Aizen Date: May 10, 2003 4:19am
Forum: web Subject: Re: Removal of Site- not from google

So let's see if we can figure out what is going on. You should know that having a robots.txt file is going to block your site from being indexed by Wayback, Alexa, Google, and all other, respectful, search engines and crawlers. I'm a bit confused about your robots.txt situation - you say you had one but then deleted it? If you had one or still do, that is a good candidate for why your site has had its recent Google issues.


Poster: Concerned with Copyright Date: May 10, 2003 5:45am
Forum: web Subject: Re: Removal of Site- not from google



Interesting, but that's not what is stated in the instructions for removing sites from the Wayback Machine. It states at

http://www.archive.org/about/exclude.php

that

*********************

"The Internet Archive is not interested in offering access to Web sites or other Internet documents whose authors do not want their materials in the collection. To remove your site from the Wayback Machine, place a robots.txt file at the top level of your site (e.g. www.yourdomain.com/robots.txt) and then submit your site below."

The robots.txt file will do two things:

It will remove all documents from your domain from the Wayback Machine.
It will tell us not to crawl your site in the future.

To exclude the Internet Archive's crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

User-agent: ia_archiver
Disallow: /

********************

The robots.txt file has since been removed from the site, but I used the instructions found here to construct it. Apparently, those instructions are not correct, as the file will also prevent crawls from Alexa, Google (and, it also appears, from Yahoo!). Is that right?

Interestingly, the site still maintains a top listing at:
altavista.com
askjeeves.com
dogpile.com
mamma.com
msn search
teoma.com

As our site's visitors have historically come from searches on Google (and, to a lesser extent, Yahoo!), if the robots.txt file listed above blocks sites from appearing in their crawls, that would explain the huge plunge in visits to our site since the robots.txt file was placed on the site. As I mentioned previously, that file was removed a while ago.


Poster: Igor Ranitovic Date: May 13, 2003 2:51am
Forum: web Subject: Re: Removal of Site- not from google

Your robots.txt file is blocking ONLY Alexa's crawler (User-agent: ia_archiver).

If I understand correctly that is what you want. You do not want your site to be archived. Right?
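
The point about user agents can be checked offline. What follows is only a minimal sketch: it assumes Python's standard-library robots.txt parser is a fair stand-in for a well-behaved crawler, and real crawlers may interpret rules differently. The two-line rule is the one quoted from the exclusion page earlier in this thread. The blanket "User-agent: *" rule, the helper name allowed_agents, the agent names Googlebot and Slurp, and the example.com URL are illustrative assumptions and do not come from the thread.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted from the exclusion instructions in this thread.
TARGETED_RULES = """\
User-agent: ia_archiver
Disallow: /
"""

# A blanket rule, included only for contrast (assumption, not from the thread).
BLANKET_RULES = """\
User-agent: *
Disallow: /
"""


def allowed_agents(rules_text, agents, url="http://www.example.com/"):
    """Map each user agent to True if the rules let it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical list of crawler names used only for this comparison.
AGENTS = ["ia_archiver", "Googlebot", "Slurp"]

targeted = allowed_agents(TARGETED_RULES, AGENTS)
blanket = allowed_agents(BLANKET_RULES, AGENTS)

print("targeted rule:", targeted)
print("blanket rule: ", blanket)
```

Under this parser, the targeted rule marks only ia_archiver as disallowed, while the blanket rule disallows every agent in the list. That is the distinction between "a robots.txt file exists" and "a robots.txt file blocks everyone".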


Poster: Concerned with Copyright Date: May 13, 2003 8:47am
Forum: web Subject: Re: Removal of Site- not from google

Actually, the robots.txt file hasn't been on my site for a long time (well over a month now). I don't want the site archived; however, I do want people to find it, and therefore it should be on Alexa. Another VERY strange thing is that none of my sites (I currently have 4 up) appear on Alexa - even though I only put the robots.txt file on one of the sites.


Poster: Igor Ranitovic Date: May 13, 2003 8:55am
Forum: web Subject: Re: Removal of Site- not from google

Check http://www.edethics.org/robots.txt

This file exists and it is blocking Alexa's crawler. Please check with your webmaster.

On the other hand, I am confused about what you really want. So I believe that the best thing for you is to give us a call. This way we can clear things up in the most productive way.

Take care,
Igor.
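
The check suggested in this post (is a robots.txt still being served, and whom does it block?) can be sketched roughly as follows. This is a hedged illustration, not anyone's actual tooling. It assumes a missing file shows up as an HTTP 404 and that Python's standard-library parser approximates a polite crawler. The function names, the fake_server_* stand-ins, and the example.com address are hypothetical. The stand-ins replace the real network call so the sketch runs offline.

```python
import io
from urllib.error import HTTPError
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots_txt(site_url, opener=urlopen):
    """Return the site's robots.txt text, or None if the server answers 404."""
    url = site_url.rstrip("/") + "/robots.txt"
    try:
        with opener(url) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return None  # no robots.txt is being served
        raise


def describe(robots_text, agents):
    """Summarise, per user agent, whether the site root may be crawled."""
    if robots_text is None:
        return {agent: "allowed (no robots.txt)" for agent in agents}
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {
        agent: "allowed" if parser.can_fetch(agent, "/") else "blocked"
        for agent in agents
    }


# Offline stand-ins for a real server (hypothetical, for illustration only).
def fake_server_with_file(url):
    return io.BytesIO(b"User-agent: ia_archiver\nDisallow: /\n")


def fake_server_without_file(url):
    raise HTTPError(url, 404, "Not Found", None, None)


AGENTS = ["ia_archiver", "Googlebot"]
still_there = describe(
    fetch_robots_txt("http://www.example.com", fake_server_with_file), AGENTS
)
removed = describe(
    fetch_robots_txt("http://www.example.com/", fake_server_without_file), AGENTS
)

print("file still served:", still_there)
print("file removed:     ", removed)
```

To run the same check against a live site, call fetch_robots_txt with the site's address and leave opener at its default. A file that was deleted locally but is still returned by the server will show up here as text rather than None.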



Poster: Concerned with Copyright Date: May 13, 2003 9:20am
Forum: web Subject: Re: Removal of Site- not from google

UGH!!!!!! I don't know why the robots.txt is there - I'm the webmaster and had removed the thing!!! I'll go remove it again!!!! That would explain a LOT.

Anyway, I don't want the site to be archived - but I DO want it picked up by Alexa, Google, etc. Hits to the site have PLUNGED since all of this started.

Also, I host 4 different sites on the account that this site is located on - none of them appear, however, on Alexa or in Google searches - would the robots.txt file prevent that from happening too?

Thanks for your help.


Poster: Concerned with Copyright Date: May 13, 2003 11:54am
Forum: web Subject: Re: Removal of Site- not from google

That's it! Just removed the robots.txt file from the site (I had removed it once before). Suddenly, our site is back at #1 on Google. Wow - the robots.txt file should NOT have caused a problem with Google but now I'm CERTAIN that it did. Anyway, problem solved!


Poster: Concerned with Copyright Date: May 13, 2003 5:24pm
Forum: web Subject: Re: Removal of Site- not from google

WHAT is GOING ON!!!!! After several months of falling off of Google's charts, our site suddenly appeared in the NUMBER ONE position this afternoon - I know because I personally saw it there!!!! Just checked again, and it's disappeared again (actually, it's on page 11) AND typing link:www.edethics.org displays NOTHING - again?!?!?!!?!? I do know that it appeared RIGHT after the robots.txt file was removed (again). This is VERY strange. Any ideas????


Poster: PcDevils Date: Nov 26, 2003 9:44am
Forum: web Subject: Re: Removal of Site- not from google

Did you ever consider it may be because your site sucks?

Fewer links and relevant searches being made are more likely to be the cause of the drop in Google. That, and people must not be searching for that exact topic as much. Get over it and stop blaming the archive for a coincidence.


Poster: Concerned with Copyright Date: Nov 26, 2003 10:37am
Forum: web Subject: Re: Removal of Site- not from google

"Did you ever consider it may be because your site sucks?"

Nope. Only your own comments do.

"Fewer links and relevant searches being made are more likely to be the cause of the drop in Google. That, and people must not be searching for that exact topic as much. Get over it and stop blaming the archive for a coincidence."

Your ignorance is astounding. First of all, you're replying to a post that was made a LONG time ago. Second, the matter WAS related to the archive. Third, as the problem was resolved MONTHS ago, the site is back at the top.

Grow up.

This post was modified by Concerned with Copyright on 2003-11-26 18:37:25


Poster: Jonathan Aizen Date: May 10, 2003 7:23am
Forum: web Subject: Re: Removal of Site- not from google

Aha,

If you put: User-agent: ia_archiver in your robots.txt file then Google should still be crawling your site (and is, as shown by Diana). I don't know about Alexa - it's possible that Alexa's crawler has user agent ia_archiver, but that seems unlikely.

Anyway, all I can tell you for sure is that your site is not being intentionally punished. If you want to know why your site isn't coming up as it used to on Google, I suggest you contact them. Best of luck and I'd be interested to hear what they have to tell you.

This post was modified by Jonathan Aizen on 2003-05-10 14:23:47


Poster: Concerned with Copyright Date: May 10, 2003 7:29am
Forum: web Subject: Re: Removal of Site- not from google

If I hear anything from Google, I'll also post here. The last thing I heard from them was that they were "looking into the problem."